Optimize memory allocations in the event processing pipeline #36830

Closed
wants to merge 2 commits into from

Conversation

rdner
Member

@rdner rdner commented Oct 12, 2023

Previously, every time a processor ran on an event, we cloned the entire event for two reasons:

  1. The event could contain nested maps that are shared among multiple events.

  2. If a processor fails partway through a change, it must be able to revert its partial changes.

This change adds a new EventEditor wrapper that processors use to collect pending event changes, with the option to Apply or Reset them.

Additionally, EventEditor manages memory efficiently by cloning only the nested maps that processors access or modify. Most processors just put new keys or delete existing keys at the root level, so the nested maps in the event usually remain untouched and the whole event does not need to be cloned.
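
To illustrate the idea, here is a minimal sketch of such a wrapper. It is not the code from this PR: the `editor` type and its `Put`, `Delete`, `Nested` and `cloneMap` names are hypothetical, and it assumes the root-level map belongs to a single event while nested maps may be shared. Pending writes and deletions are buffered, a nested map is cloned only the first time it is accessed, and nothing reaches the event until `Apply` is called.

```go
package main

// editor is a hypothetical, simplified stand-in for the EventEditor idea.
// It buffers changes to an event's fields and never mutates nested maps
// that may be shared with other events.
type editor struct {
	original map[string]interface{} // root map of the event (assumed not shared)
	pending  map[string]interface{} // root-level keys written but not yet applied
	deleted  map[string]struct{}    // root-level keys marked for deletion
}

func newEditor(fields map[string]interface{}) *editor {
	e := &editor{original: fields}
	e.Reset()
	return e
}

// Reset discards every pending change, leaving the event as it was.
func (e *editor) Reset() {
	e.pending = map[string]interface{}{}
	e.deleted = map[string]struct{}{}
}

// Put records a root-level write without touching the event.
func (e *editor) Put(key string, value interface{}) {
	delete(e.deleted, key)
	e.pending[key] = value
}

// Delete records a root-level deletion without touching the event.
func (e *editor) Delete(key string) {
	delete(e.pending, key)
	e.deleted[key] = struct{}{}
}

// Nested returns a writable map for key. The possibly shared original is
// cloned on first access only; later calls reuse the pending clone.
func (e *editor) Nested(key string) map[string]interface{} {
	if m, ok := e.pending[key].(map[string]interface{}); ok {
		return m
	}
	var src map[string]interface{}
	if _, gone := e.deleted[key]; !gone {
		src, _ = e.original[key].(map[string]interface{})
	}
	clone := cloneMap(src)
	e.Put(key, clone)
	return clone
}

// Apply writes all pending changes into the event and clears the buffers.
func (e *editor) Apply() {
	for k := range e.deleted {
		delete(e.original, k)
	}
	for k, v := range e.pending {
		e.original[k] = v
	}
	e.Reset()
}

// cloneMap deep-copies nested maps; other values are copied as-is.
func cloneMap(src map[string]interface{}) map[string]interface{} {
	dst := make(map[string]interface{}, len(src))
	for k, v := range src {
		if nested, ok := v.(map[string]interface{}); ok {
			v = cloneMap(nested)
		}
		dst[k] = v
	}
	return dst
}
```

In this sketch a processor that only calls `Put` or `Delete` on root-level keys never triggers a clone, and a processor that fails midway can call `Reset` instead of `Apply` to leave the event unchanged.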

Processors migration progress:

BEFORE/AFTER benchmark results will be added here once the PR is finished.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

I added unit tests and a benchmark.

Related issues

@rdner rdner added enhancement libbeat Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team backport-7.17 Automated backport to the 7.17 branch with mergify labels Oct 12, 2023
@rdner rdner self-assigned this Oct 12, 2023
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Oct 12, 2023
@elasticmachine
Collaborator

elasticmachine commented Oct 12, 2023

💔 Build Failed

Build stats

  • Duration: 7 min 59 sec

Pipeline error 1

This error is likely related to the pipeline itself (either incorrect syntax or an invalid configuration).

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@rdner rdner force-pushed the processor-optimization branch from 19dc9e5 to 30965cf Compare October 13, 2023 15:20
@rdner rdner added the skip-ci Skip the build in the CI but linting label Oct 16, 2023
@rdner rdner force-pushed the processor-optimization branch 2 times, most recently from 7cdf193 to 8c636f6 Compare October 17, 2023 15:21
@amitkanfer
Collaborator

can't wait for this to land. will have a great positive impact on performance!

@rdner rdner force-pushed the processor-optimization branch from 8c636f6 to b236659 Compare October 18, 2023 10:46
@rdner
Member Author

rdner commented Oct 25, 2023

Closing in favour of #36958

@rdner rdner closed this Oct 25, 2023
@amitkanfer
Collaborator

Closing in favour of #36958

decided to first deliver a first set of processors?

Labels
backport-7.17 Automated backport to the 7.17 branch with mergify enhancement libbeat skip-ci Skip the build in the CI but linting Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

4 participants